Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

Capacity Planning

Disk Space Requirement of Data Warehouse

Disk space required to set up data warehouse is calculated using size of following tables:
  • Fact Table, F
  • Dimension Table, D
  • Aggregate Table, A
  • Index Table for the Fact Table, Fi
  • Index Table for the Dimension Table, Di
  • Index Table for the Aggregate Table, Ai
  • Temporary Space
  • Sort Space
Required Disk space = F + Fi+A + Ai + D + Di + T + S
= 3F + Fi+ D + Di + T + Sas, F = A = Ai
= 3F + 0.3F + D + Di + T + Sas, Index of Fact is approximate 30% of Fact Table
= 3F + 0.3F+0.1F+0.1F + T + Sas D near equal to Di and D is 10% of Fact Table
Required Disk Space= 3.5F + T + S
Required Disk Space<= 3.5F + Tas S is less than T and F
Temporary Space is required to perform tasks like loading operational data, cleaning and transforming data. Temporary space requirement is calculated using number of concurrent queries, n, that is allowed to execute and size of largest partition, P.
Temporary Space, T = (2n+ 1)P
For example, let's suppose we have Fact table of size 30 GB and we have to keep 4 years data live. Total fact data is 30 * 4 = 120 GB
Disk Space Required= 3.5 F + T
= 3.5 * 120 + T
= 420 + T GB
T= (2n + 1 ) P, let n = 4 and P = 4GB
Disk Space Required= 420 + (2n+1)P
=420 +(2 * 4 + 1) * 4
=420 +36
=456 GB

Required CPU capacity

First, we have to find the time required to access the largest amount of data warehouse data. Fact table is the largest table in the data warehouse. So, using Fact Table (F), Scan rate of I/O device, we can calculate the time required to scan the whole Fact table, T = F/S --- (i)
Then we have to get the response time excepted by the users, Tp
Then we calculate required scan rate according to user's response time, Sp = F/Tp
After getting the required scan rate we can know how much CPU will be needed to satisfy that scan rate.